Chromatin segmentation

ABSTRACT

A method of segmenting chromatin particles in a nucleus of a cell by locating regional minima in an image, computing a zone of influence (ZOI) around each regional minimum, and segmenting a single chromatin blob within each ZOI using a region growing procedure. The method can be used as the basis of a method of qualitatively characterizing the distribution of nuclear chromatin by computing features for individual chromatin particles. Chromatin features can be synthesized from the features of individual particles and particle features can be synthesized into nucleus features and slide features. The method is useful for detecting malignancy associated changes and changes during neoplasia. The method may also be used more generally to assess chromatin patterns in living cells during the cell life cycle. This makes it possible to measure alternations in the evolving patterns that result from pathological or environmental influences.

This invention is a method for segmenting (delineating) the chromatin inthe nucleus of a cell. Chromatin is visualized by light microscopy as apatchwork of light and dark regions. The dark regions correspond toareas of high optical density and are blob-like or particle-like inappearance. The light regions correspond to areas of clearing and canalso be interpreted as blobs or particles. In particular, though notexclusively, this invention is a method for segmenting these light anddark particles.

BACKGROUND TO THE INVENTION

The distribution of the chromosomes or DNA in the nuclei of cells can bequantitatively measured using a computer and image analysis techniques.Moreover, these measurements or features can be used to detect bothmalignancy associated changes (MACs), and changes during neoplasia. Thefeatures that appear to have the most discriminatory power are texturefeatures. Such features quantitatively describe the intensity variationof the chromatin (the substance that constitutes the chromosomes or DNAand is readily visualized by staining) in the nucleus of a cell. Themost widely used chromatin texture features are based on a statisticalor probabilistic assessment of the gray-levels (intensity levels oroptical density levels) in the cell nucleus. Unfortunately thesefeatures are difficult to relate to the terms and adjectives (e.g.granularity, compaction, margination, clumping, chromatin particles,condensation) used by cytologists to describe chromatin texture.Moreover they are usually defined at the pixel level and therefore failto take into account the structural aspect of the chromatindistribution; e.g. this is true of all but the discrete texture featuresdescribed in the United States Patent of Palcic et al. [System andmethod for automatically detecting malignant cells and cells havingmalignancy-associated changes; U.S. Pat. No. 6,026,174 dated Feb. 15,2000]. Therefore their efficacy in relation to quantifying chromatindistribution is questionable.

An alternative approach to computing chromatin texture features is tosegment the chromatin into aggregates and then to synthesize chromatinfeatures from quantitative features computed for these aggregates. Thisapproach has two advantages: (i) the segmentation step introducesstructural information, and (ii) the synthesized features can be relatedto qualitative descriptions of chromatin texture made by cytologists.The key to this approach is the segmentation step. Several differentmethods of chromatin segmentation have been published in the literature.A characteristic they have in common is that they require the a priorispecification of one or more operational parameters such as thresholdvalues and region merging criteria. Moreover these parameters need to betuned to the particular application. As a consequence these methods arenot robust to changes in, or non-uniformity of, illumination andstaining. The quality of the segmentations produced is arguably poor.This in turn affects the quality of the chromatin features that arecomputed from such segmentations. This is likely one of the majorreasons that such features, with the exception of those of Young,Verbeek, and Mayall [Characterization of chromatin distribution in cellnuclei; Cytometry; vol. 7; 1986; pp. 467-474], have not found widespreaduse. Another possible reason is that the software implementation of thesegmentation step is complicated.

Three existing methods of chromatin segmentation deserve specialattention. The first is the method of Young, Verbeek, and Mayall becauseit is the basis of the discrete texture features detailed in theaforementioned United States Patent of Palcic et al. The second is themethod of Wolf, Beil, and Guski [Chromatin structure analysis based on ahierarchic texture model; Analytical and Quantitative Cytology andHistology; vol. 17; no. 1; 1995; pp. 25-34] because it utilizes thewatershed transform as does the preferred embodiment of the presentinvention. The third is the method of Kondo and Taniguchi [Evaluation ofthe chromatin for cell images; Systems and Computers in Japan; vol. 17;no. 9; 1986; pp. 11-19] because it represents the closest known priorart to the present invention.

The Young, Verbeek, and Mayall (YVM) method of chromatin segmentationtakes as input a digitized image of a cell nucleus visualized by lightmicroscopy. Young, Verbeek, and Mayall illustrate their method on imagesobtained from foam cells in human nipple aspirate fluid, and raturothelial cells. The method of staining is unspecified. The YVMchromatin segmentation method involves nothing more than partitioningthe gray-level histogram of the nucleus image into three parts (i.e.choosing two gray levels), and then using this division to label eachnucleus pixel. The result is a segmentation comprising regions of low,medium, and high optical density. The manner in which the partitionpoints (threshold values) are determined must be a priori specified.Although the YVM segmentation method is simple (hence its relativepopularity) the quality of the segmentation is questionable for thefollowing two reasons. Firstly, the method of segmentation utilizes onlythe intensity histogram and does not take into account any spatialinformation. Secondly, the method requires the specification of twothreshold values. The manner in which these are chosen must be a priorispecified. Moreover they must be tuned to the particular application.

The Wolf, Beil, and Guski (WBG) method of chromatin segmentation takesas input a digitized image of a cell nucleus, visualized by lightmicroscopy, from a cervical tissue section obtained by colposcopicbiopsy and stained by the Feulgen method. The WBG method consists of twosteps. The first step involves determining the watershed of the gradientof the input image. This is done using a modification of the classicwatershed algorithm of Vincent and Soille [Watersheds in digital spaces:an efficient algorithm based on immersion simulations; IEEE Transactionson Pattern Analysis and Machine Intelligence; vol. 13; no. 6; 1991; pp.583-598]. The result is an oversegmentation; i.e. too many regions aredelineated and as a consequence the result does not correspond very wellto the chromatin patches in the original image. The second step involvesselectively merging the regions segmented in the first step.Specifically, this step involves fitting a plane to each segmentedregion using standard least-squares techniques and then iterativelymerging neighboring regions based on merging criteria related to thestandard deviation of gray-levels in the regions. The decision to mergetwo regions is based on the evaluation of a single parameter that isthen compared to a threshold value. This is a drawback of the methodbecause the manner in which this threshold value is determined must be apriori specified.

The Kondo and Taniguchi (KT) method of chromatin segmentation representsthe closest known prior art to the present invention. The method takesas input a digitized image of a cell nucleus, visualized by lightmicroscopy, from a Pap smear. The method comprises three steps: (i)local maxima (with respect to optical density) are located in the inputimage (these correspond to local minima of intensity), (ii) the inputimage is partitioned into sub-images (regions), each containing a singlemaximum, and (iii) a chromatin granule (densely stained blob ofchromatin) is segmented from each region in turn using local adaptivethresholding. Kondo and Taniguchi propose three different methods forthe partitioning step: (i) partitioning using a Voronoi neighborhood,(ii) region partitioning by directed tree, and (iii) area expansion bydifference direction. A drawback of the Voronoi neighborhood method isthat it does not use the topography of the input image to determine aregion around each minimum. Consequently it is possible that the regiondetermined around a minimum cuts through one or more adjacent chromatinparticles. A drawback of the directed tree method is that it isnecessary to a priori select a sensitivity parameter to control growth.A drawback of the density difference method is that the growth is notprescribed by geodesic distance (i.e. if the image is viewed as alandscape then the growth is not prescribed by the topography of thelandscape). The local adaptive thresholding method of segmenting agranule from each region has several potential drawbacks includingsensitivity to noise and non-uniform illumination, and the need toprescribe the manner in which the threshold value is determined.

The present invention is specifically designed for the purpose ofsegmenting chromatin particles in the nucleus of a cell. The methodtakes as input an image of the nucleus of a cell. Consequently the taskof segmenting a cell from a field of cells and the task of segmentingthe nucleus from a single cell, are not the subjects of this invention.Indeed details of these tasks are described in International PatentApplication number PCT/AU01/00787 (WO 02/03331) and co-pendingInternational Patent Application number PCT/AU99/00231 (WO 99/52074)respectively.

OBJECT OF THE INVENTION

It is an object of the present invention to provide a means ofsegmenting chromatin in a nucleus of a cell.

SUMMARY OF THE INVENTION

In one form, although it need not be the only, or indeed the broadestform, the invention resides in a method of segmenting chromatinparticles in the nucleus of a cell including the following steps of:

-   (i) locating regional minima in the image;-   (ii) computing a zone of influence (ZOI) around each regional    minimum; and-   (iii) segmenting a single chromatin blob within each ZOI using a    region growing procedure.

The input image is preferably a two-dimensional gray-scale imagecomprising only those pixels that define the nucleus of a cell. It willbe appreciated that the method is not limited to two-dimensionalgray-scale images. The method can be applied to three-dimensional imagesin which case regional minima will be sets of voxels rather than pixels.The method can also be applied to multi-valued (including multispectral)images.

The method may further include the step of evaluating the contrast ofeach regional minimum and discarding those regional minima that do notsatisfy a priori specified contrast criteria.

The method may also include a preliminary step of pre-processing theinput image to correct for degradations. The pre-processing stepsuitably removes degradations such as noise and blurring. Thepre-processed image may optionally be up-sampled.

The regional minima are suitably regions of constant gray-value that aresurrounded by pixels of strictly higher (lighter) gray-value. Eachregional minimum identifies the location of a dark blob.

The method may also be applied to the photographic negative of an imageto identify the locations of light blobs. This is equivalent toidentifying regional maxima in the original (positive) image.

In preference, the step of computing a zone of influence is performed bymeans of seeded region growing, or catchment basins determined by thewatershed transform, or influence zones (IZs) with respect to an apriori specified metric.

The segmentation step is suitably performed by means of either awatershed transform or a seeded region growing algorithm.

In a further form, the invention resides in a method of quantitativelycharacterizing the structure of nuclear chromatin including-the stepsof:

-   (i) obtaining an image of a cell nucleus showing chromatin texture;-   (ii) locating regional minima in the image;-   (iii) computing a zone of influence (ZOI) around each regional    minimum;-   (iv) segmenting a single chromatin blob within each ZOI using a    region growing procedure; and-   (v) computing features for individual chromatin blobs.

The method may further include the step of:

-   (vi) synthesizing chromatin features from the features computed for    individual chromatin blobs.

The method may suitably include the further step of repeating steps (ii)to (v) for the negative of the image of step (i) and using the computedfeatures in step (vi).

The method may further include the step of evaluating the contrast ofeach regional minimum in the image of the cell nucleus and discardingthose regional minima that do not satisfy a priori specified contrastcriteria.

The image of a cell nucleus may be a suitable gray-scale image.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred embodiment of the invention is described with reference tothe following figures in which:

FIG. 1 is a flowchart showing the steps of the invention;

FIG. 2 is an example input to the present invention;

FIG. 3 shows the result after the application of a 3×3 median filter toFIG. 2 and then up-sampling by factor 2;

FIG. 4 shows the location of the regional minima of FIG. 3 (depicted assmall white regions or connected components);

FIG. 5 shows the watershed lines (white lines) produced by theapplication of the watershed transform to the image in FIG. 3 using thewhite regions of FIG. 4 as markers;

FIG. 6 shows the watershed lines (white lines) produced by theapplication of the watershed transform to the morphological gradient ofthe image in FIG. 3 using the white regions of FIG. 4 and the whitelines of FIG. 5 as markers. The white lines in this figure delineate thechromatin particles defined by the topography of the gradient image;

FIG. 7 shows the boundary lines (white lines) produced by theapplication of a seeded region growing algorithm to the image in FIG. 3using the white regions of FIG. 4 and the white lines of FIG. 5 asseeds. The white lines in this figure delineate chromatin particlesdefined by the homogeneity criteria of the seeded region growingalgorithm (in this case the homogeneity of gray values);

FIG. 8 shows the negative of the image in FIG. 2;

FIG. 9 shows the segmentation produced when FIG. 8, rather than FIG. 2,is used as the input to the preferred embodiment of the presentinvention;

FIG. 10 shows examples of the intermediate images used to compute thepreferred features of a particle;

FIG. 11 shows a Delaunay graph for the dark particles of FIG. 10;

FIG. 12 shows a cytology slide, a magnified image of a particular fieldon the slide, and a magnified image of one of the nuclei from this fieldof view; and

FIG. 13 shows a small gallery of nuclei for which the dark particleshave been segmented.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The preferred embodiment of the present invention is similar to the KTmethod in the sense that it involves the steps of locating minima in theintensity image, partitioning the image into regions each containing asingle minimum, and then segmenting a single blob in each region. Incontrast to the KT method the present invention locates regional minima(with respect to intensity) rather than local minima. A regional minimumis a region or connected component of constant gray-value that issurrounded by pixels of strictly higher (lighter) gray-value, whereas alocal minimum is a pixel whose neighbors are of strictly highergray-value. In contrast to the KT method the present invention, in itspreferred embodiment, grows these minima using the watershed transform.If the input image is viewed as a topographic landscape (with the lightpixels representing high points and the dark pixels -representing lowpoints), then the boundaries of the regions determined by the watershedtransform are guaranteed to lie between the minima as determined by thetopography of the surface. In contrast to the KT method the presentinvention, in its preferred embodiment, segments a single blob in eachregion using the watershed transform.

FIG. 1 is a flowchart of the steps in the preferred embodiment of thepresent invention. The input image is denoted by the letter I on theflowchart. It is a gray-scale image in which the chromatin is visualizedas a patchwork of light and dark regions. The darker the region, themore optically dense is the chromatin. FIG. 2 is an example of such animage. The white area surrounding the gray area is not considered to bepart of the image. The first optional step is pre-processing. Dependingon the quality of I, this step may include filtering to attenuate noise,deconvolution to correct for blur, and up-sampling to facilitate linerendering in subsequent steps. The preferred method of pre-processinginvolves the application of a 3×3 median filter [Gonzalez & Woods;Digital image processing; Addison-Wesley; Reading, Mass.; ISBN:0-201-50803-6; 1992; pp. 191-195] followed by up-sampling by factor 2.The preferred method of up-sampling by factor 2 involves nothing morethan replacing each pixel with a 2×2 block of pixels of the samegray-value. FIG. 3 shows the result after applying the preferred methodof pre-processing to the image I of FIG. 2. The pre-processed image isdenoted I′ on the flowchart.

The next step is to locate the regional minima in the pre-processedimage I′. A regional minimum is a region or connected component,comprising one or more pixels of constant gray value, that is surroundedby pixels of strictly higher (lighter) gray value. The output from thisstep is a binary image M containing connected components, each of whichmarks the location of a regional minimum. Several methods for detectingregional minima exist in the literature; e.g. the grayscalereconstruction method of Vincent [Morphological grayscale reconstructionin image analysis: applications and efficient algorithms; IEEETransactions on Image Processing; vol. 2; no. 2; April 1993; pp.184-185]. The present invention does not require that any particularmethod be employed. FIG. 4 shows the location of the regional minima ofthe pre-processed image I′ of FIG. 3 (the image M consists of the smallwhite connected components).

As mentioned earlier, the preferred embodiment is described by referenceto two-dimensional gray-scale images. It will be appreciated that theprocess may be generalized to locate sets of voxels in athree-dimensional image rather than sets of pixels in a two-dimensionalimage.

The next optional step is to filter the image M according to a priorispecified contrast criteria. This involves computing a contrastvaluation, with respect to the pre-processed image I′, for each regionalminimum, and then discarding those minima that do not satisfy thecontrast criteria. The filtered image is denoted M′ on the flowchart.Two particularly useful contrast measures are dynamics devised byGrimaud [A new measure of contrast: the dynamics; Proceedings of theSPIE—The International Society for Optical Engineering; vol. 1769; 1992;pp. 292-305] and symmetrical dynamics devised by Vachier and Vincent[Valuation of image extrema using alternating filters by reconstruction;Proceedings of the SPIE—The International Society for OpticalEngineering; vol. 2568; 1995; pp. 94-103]. The preferred measure ofcontrast is dynamics. The preferred contrast criterion is to select allregional minima with a dynamics value of 1 or greater. Effectively thispreferred criterion leads to all of the connected components of M beingretained; i.e. M′=M. Whilst from a segmentation point of view thepreferred contrast criterion has no effect on the final segmentation,the computed dynamics values do provide a set of valuable features fromwhich chromatin features can subsequently be computed (an intendedapplication of the present invention).

The next step is to compute a zone of influence (ZOI) around each of theconnected components in M′. The image containing the ZOIs is denoted byZ on the flowchart. Depending on the method of implementation, Z may beeither a binary image of lines delineating the zones of influence or itmay be a gray-scale image in which each ZOI has its own unique numericallabel. The preferred method for computing Z is to apply the watershedtransform [Beucher & Meyer; The morphological approach to segmentation:the watershed transformation; in Mathematical morphology in imageprocessing; E. R. Dougherty (Ed.); chapter 12; Marcel Dekker, Inc., NewYork; ISBN: 0-8247-8724-2; 1993; pp. 433-481] to the pre-processed imageI′ using the connected components of M′ as markers. Another possibilityis to apply a scan-order-independent seeded region growingalgorithm—such as that of the inventors, Mehnert and Jackway [Animproved seed region growing algorithm; Pattern Recognition Letters;vol. 18; 1997; pp. 1065-1071] or that of Beare and Talbot [Exact seededregion growing for image segmentation; Proceedings of the FifthInternational/National Biennial Conference on Digital Image Computing,Techniques, and Applications (DICTA'99); 7-8 December, Perth, WesternAustralia; 1999; pp. 132-137]—the pre-processed image I′ using theconnected components of M′ as seeds. Yet another possibility is todetermine the influence zone (IZ) around each connected component of M′using an a priori specified metric. The IZ of a connected component isthe set of all pixels that are closer to it than to any other connectedcomponent. FIG. 5 shows the result (white lines) of the preferredmethod; i.e. the application of the watershed transform to image I′ ofFIG. 3 using the connected components M′ of FIG. 4 as markers.

The next step is to segment a single blob (chromatin particle) withineach ZOI in Z. This is done using a region growing procedure. Thepreferred method for doing this is to apply the watershed transform tothe modulus of the gradient of the pre-processed image I′ using both theconnected components of M′ and the boundary lines of Z as markers (theseboundary lines may or may not include the boundary of the entire nucleusitself; the preferred method is not to include this). Numerous methodsfor computing the modulus of the gradient exist in the literature. Thepreferred method is the discrete morphological gradient [Rivest, Soille,& Beucher; Morphological gradients; Journal of Electronic Imaging; vol.2; no. 4; 1993; pp. 326-336]. Another possibility for segmenting thechromatin blobs is to apply a scan-order-independent seeded regiongrowing algorithm to the pre-processed image I′ using both the connectedcomponents of M′ and the boundary lines of Z as seeds (once again theseboundary lines may or may not include the boundary of the entire nucleusitself). FIG. 6 shows the result of the preferred method; i.e. theapplication of the watershed transform to the morphological gradient ofthe image I′ of FIG. 3 using the connected components of M′ shown inFIG. 4 and the watershed lines (boundary lines of Z) shown in FIG. 5 asmarkers. FIG. 7 shows the result of the alternative method; i.e. theapplication of seeded region growing to the image I′ of FIG. 3 using theconnected components of M′ shown in FIG. 4 and the watershed lines(boundary lines of Z) shown in FIG. 5 as seeds.

If the input image I is replaced with its photographic negative (for an8-bit gray-scale image with gray-values ranging from 0 to 255 thisamounts to subtracting each pixel's value from 255) then the (dark)particles segmented by the present invention will correspond to lightparticles in the original positive image. FIG. 8 shows the photographicnegative of the image in FIG. 2. FIG. 9 shows the segmentation producedby the preferred embodiment of the present invention when FIG. 8 is usedas the input.

Although the preceding discussion has been based upon two-dimensionalgray-scale images, the inventors envisage that the invention can beapplied to higher dimensional images, and to multispectral and othermulti-valued images. For example, the invention can be applied to colorimages for which each pixel has multiple components. The nature of thesecomponents is dictated by the underlying color model such as HLS (hue,luminance, saturation), YIQ (luminance, in-phase, quadrature), CMY(cyan, magenta, yellow), and RGB (red, green, Blue). HLS, YIQ, CMY, andRGB are all multi-value images. The seeded region growing algorithmsaforementioned readily generalize to three-dimensional gray-scale imagesand to color images. A three-dimensional version of the watershedtransform also exists [Cotsaces and Pitas; Computing the watersheds oflarge three-dimensional images using limited random access memory; inMathematical Morphology and its Applications to Image and SignalProcessing; Dordrecht, The Netherlands; Kluwer Academic Publishers; ISBN0-7923-5133-9; 1998; pp. 239-246].

A characteristic of all of the known chromatin segmentation methods isthat they require the a priori specification of one or more operationalparameters such as threshold values and region merging criteria. Thepresent invention differs in that it is, in its preferred embodiment, aparameter-free method of segmenting chromatin particles. In addition, incomparison to existing methods, the present invention produces adiscernibly better segmentation of nuclear chromatin; i.e. for any givennucleus image the method yields a segmentation of chromatin particlesthat corresponds well with what a human observer might intuitivelyperceive to be blobs or particles.

Persons skilled in the relevant art may realize variations from thepreferred embodiment that will nonetheless fall within the scope of thisinvention. One such variation, for example, is to determine the ZOIswithout rendering boundary lines, and then to segment a blob in each ZOIin turn (sequentially and independently).

The inventors believe that the most important application of the presentinvention is to the development and computation of features thatquantitatively characterize the structure of nuclear chromatin. Inparticular these features can be synthesized from features computed forthe individual chromatin particles including area, perimeter, volume,surface area, average gray-level, circularity, dynamics, number ofneighbors, average distance to neighbors, and distance to nuclearboundary. In turn the inventors believe that such features have two veryimportant applications. The first is that such features can be used todetect changes during neoplasia, and malignancy associated changes. Thesecond is that such features can be used to select cell nuclei withsufficient texture for subsequent analysis using standard texturefeatures (such as those described in the aforementioned patent of Palcicet al.) and/or features computed from the current invention.

The following discussion describes the computation of particle featuresand the synthesis of nucleus features and slide features. The discussionfollows the gray-scale example used above.

Particle Features

In the preferred embodiment of the invention, features are computed forboth the dark and light particles of a nucleus image. The formerparticles are found from a segmentation (using the procedure alreadydescribed) of the nucleus image. The latter are found from asegmentation of the photographic negative of the nucleus image. FIG. 10shows examples of the intermediate images used to compute the preferredfeatures of a particle. FIG. 10 is not to scale. The reduction in scalecan be gauged by comparing the top left image of FIG. 10 with FIG. 3.Each particle (an image object or region) is a distinct set ofgray-level pixels in the nucleus image. The cardinality of each particlecan be represented by a single connected component (binary mask).Numerous features can be found in the literature to characterize thesize, shape, boundary and texture of image objects. Compendia of suchfeatures can be found in [Grohs & Husain (Eds.); Automated cervicalcancer screening; IGAKU-SHOIN Medical Publishers; New York; ISBN:0-89640-255-X; 1994; pp. 24-38] and in the aforementioned United StatesPatent of Palcic et al. Although these compendia describe features forthe purpose of characterizing cell nuclei, the features can be appliedmore generally to any image object, including nuclear particlessegmented by the present invention. In the preferred embodiment of thepresent invention, the following features are computed for eachparticle:

Morphometric Features

These features characterize the size, shape, and boundary of a connectedcomponent. In the preferred embodiment of the present invention, thefollowing morphometric features are computed from the binary mask of aparticle.

1. Area

-   -   This is defined as the number of pixels belonging to the        particle; i.e. the number of pixels in the binary mask of the        particle.

2. Perimeter

-   -   This is computed using the digital version of Crofton's formula        [Serra; Image Analysis and Mathematical Morphology; Volume 1;        Academic Press; London; ISBN 0-12-637240-3; 1982; pp. 220-223].

3. G factor

-   -   This is computed using the method described by Danielsson [A new        shape factor; Computer Graphics and Image Processing; vol. 7;        pp. 292-299; 1978].        Densitometric Features

These features characterize the gray-level (optical density orintensity) variation of the pixels in an image object. These featurescharacterize gray-level variation only (they do not take into accountthe positions of the pixels). In the preferred embodiment of the presentinvention, the image I′ together with the binary mask of a particle areused to compute the following densitometric features.

1. Volume

-   -   This is computed from the set of gray pixels in I′ that        correspond to the binary mask of the particle. It is defined as        the sum of these gray values.

2. Mean Gray Value

-   -   This is computed from the set of gray pixels in I′ that        correspond to the binary mask of the particle. It is defined as        the mean of these gray values.

3. Dynamics

-   -   This is defined to be the dynamics value of the regional minimum        associated with the particle (the value determined during        segmentation).        Texture Features

These features characterize the gray-level (optical density orintensity) variation of the pixels in a gray-level image object, takinginto account the position of the pixels. In the preferred embodiment ofthe present invention, the image I′ together with the binary mask of aparticle are used to compute the following texture feature:

1. Surface Area

-   -   This is computed from the set of gray pixels in the discrete        morphological gradient of I′ that correspond to the binary mask        of the particle. It is defined as the sum of these gray values        [Rivest & Soille; Physical significance of image measurements;        IEEE Transactions on instrumentation and Measurement; vol. 44;        no. 3; June 1995; pp. 751-754].        Contextual Features

These features characterize relationships between one or more imageobjects. In the preferred embodiment of the present invention,relationships between a nucleus and its particles, and relationshipsbetween the particles themselves are characterized. With respect to thefirst case the following feature is computed:

1. Distance to Nucleus Boundary

-   -   This is computed from the set of gray pixels in the Euclidean        distance transform [Mehnert & Jackway; On computing the exact        Euclidean distance transform on rectangular and hexagonal grids;        Journal of Mathematical Imaging and Vision; vol. 11; 1999; pp.        223-230] of the binary mask of the nucleus that correspond to        the binary mask of the particle. It is defined as the mean of        these gray values.

With respect to the second case, contextual features are computed from(i) a neighborhood graph defined on the dark particles; (ii) aneighborhood graph defined on the light particles; and (iii) aneighborhood graph defined on both the light and dark particles. Vincent[Graphs and mathematical morphology; Signal Processing; vol. 16; 1989;pp. 365-388] describes and presents algorithms for a range of suchneighborhood graphs. The preferred type of neighborhood graph is theDelaunay graph. The Delaunay graph for the dark particles of FIG. 10 isshown in FIG. 11. In the preferred embodiment of the invention, thefollowing graph-based features are computed for particles:

2. Number of Dark Particle Neighbors a Dark Particle Has

-   -   This is determined from the neighborhood graph defined on the        dark particles; e.g. it can be seen in FIG. 11 that the dark        particle shown as a binary mask in FIG. 10 has 9 dark particle        neighbors.

3. Mean Distance Between a Dark Particle and its Dark Particle Neighbors

-   -   This is determined from the neighborhood graph defined on the        dark particles. The distance is the centroid-to-centroid        Euclidean distance; e.g. this distance is the length of a graph        edge connecting 2 particles in FIG. 11.

4. Number of Light Particle Neighbors a Light Particle Has

-   -   This is determined from the neighborhood graph defined on the        light particles.

5. Mean Distance Between a Light Particle and its Light ParticleNeighbors

-   -   This is determined from the neighborhood graph defined on the        light particles. The distance is the centroid-to-centroid        Euclidean distance.

6. Number of Dark Particle Neighbors

-   -   This is determined from the neighborhood graph defined on both        the light and dark particles.

7. Mean Distance to Dark Particles

-   -   This is determined from the neighborhood graph defined on both        the light and dark particles. The distance is the        centroid-to-centroid Euclidean distance.

8. Number of Light Particle Neighbors

-   -   This is determined from the neighborhood graph defined on both        the light and dark particles.

9. Mean Distance to Light Particles

-   -   This is determined from the neighborhood graph defined on both        the light and dark particles. The distance is the        centroid-to-centroid Euclidean distance.

10. Number of Particle Neighbors

-   -   This is determined from the neighborhood graph defined on both        the light and dark particles.

11. Mean Distance to Particle Neighbors

-   -    This is determined from the neighborhood graph defined on both        the light and dark particles. The distance is the        centroid-to-centroid Euclidean distance.

Nucleus Features

The particles segmented from a nucleus can be used to compute a varietyof nucleus features. One set of such features is obtained from summarystatistics (including moments and order statistics) computed for eachparticle feature; e.g. mean, median, mode, variance, interquartilerange, skewness, and kurtosis of particle area. These summary statisticscan be computed for the dark particles only, the light particles only,and for both types of particle. In the preferred embodiment of thepresent invention, the following features are computed:

-   1. mean of each particle feature except dynamics;-   2. standard deviation of each particle feature except dynamics;-   3. median dynamics;-   4. interquartile range of dynamics;-   5. number of dark particles; and-   6. number of light particles.

Another set of nucleus features can be computed from (i) the histogramof gray-values in the distance transform of the binary mask representingthe nucleus background between the dark particles (see FIG. 10); (ii)the histogram of gray-values in the distance transform of the binarymask representing the nucleus background between the light particles;and (iii) the histogram of gray-values in the distance transform of thebinary mask representing the nucleus background between both the lightand dark particles. Summary statistics of these histograms yield nucleusfeatures. Alternatively, the method of Russ [The image processinghandbook; Florida; CRC Press; ISBN 0849342333; 1992; p. 336] can be usedto compute a feature for each histogram that characterizes the particleclustering. In the preferred embodiment of the invention, the followingfeatures are computed:

-   7. mean of the histogram obtained from the Euclidean distance    transform of the nucleus background between the dark particles;-   8. standard deviation of the histogram obtained from the Euclidean    distance transform of the nucleus background between the dark    particles;-   9. mean of the histogram obtained from the Euclidean distance    transform of the nucleus background between the light particles;-   10. standard deviation of the histogram obtained from the Euclidean    distance transform of the nucleus background between the light    particles;-   11. mean of the histogram obtained from the Euclidean distance    transform of the nucleus background between both the light and dark    particles; and-   12. standard deviation of the histogram obtained from the Euclidean    distance transform of the nucleus background between both the light    and dark particles.

Several additional sets of nucleus features can be computed from the 3neighborhood graphs aforementioned (one defined on the dark particlesonly, one defined on the light particles only, and one defined on boththe light and dark particles). For a given graph, a co-occurrence matrixcan be defined for each particle feature as described in [Toriwaki &Yokoi; Voronoi and related neighbors on digitized two-dimensional spacewith applications to texture analysis; in Computational morphology: Acomputational geometric approach to the analysis of form; Toussaint(Ed.); Elsevier Science Publishers; Amsterdam; ISBN: 0444704671; 1988;pp. 207-228]. For example, from the histogram of dark particle areas andthe neighborhood graph defined on these particles, it is possible toconstruct a matrix such that the entry in the i-th row and the j-thcolumn represents the number of times a particle of area i is adjacentto a particle of area j. To keep the matrix size manageable or to avoidhaving a sparse matrix the number of bins in the histogram of thefeature under consideration can be reduced. For each co-occurrencematrix, co-occurrence matrix features can be computed and used asnucleus features. A compendium of co-occurrence matrix features can befound in [Walker; Adaptive multi-scale texture analysis: withapplication to automated cytology; PhD Thesis; School of ComputerScience and Electrical Engineering; The University of Queensland;Australia; 1997; Chapter 2]. In the preferred embodiment of theinvention the following features are computed:

-   13. contrast and-   14. energy computed from the co-occurrence matrix obtained from the    neighborhood graph defined on the dark particles, using the particle    feature area. To keep the matrix manageable, the feature area is    binned into two classes: (i) small containing particles with an area    less than ⅙ of the nucleus area and (ii) large containing all other    particles.    Slide Features Derived From Nucleus Features

FIG. 12 (not to scale) shows a cytology slide, an image of a singlefield of view from the slide taken by a CCD camera mounted on amicroscope, and one of the nuclei from the field of view. Features canbe synthesized for a slide (be it a cytology slide or a histology slide)by computing summary statistics (including moments and order statistics)for each feature for all or a subset of nuclei obtained from the slide.In the preferred embodiment of the invention the mean and standarddeviation are computed for each nucleus feature from a statisticallyrepresentative number of nuclei extracted from a slide.

Slide Features Derived From particle Features

Slide features can also be synthesized by computing summary statistics(including moments and order statistics) directly from particle featurescomputed for all or a subset of nuclei extracted from a slide. In thepreferred embodiment of the invention the mean and standard deviationare computed for each particle feature from a statisticallyrepresentative number of nuclei extracted from a slide.

A person skilled in the relevant art may realize possible variations tothe described process that will nonetheless fall within the scope ofthis invention. One such variation is to use different estimators thanthose described to compute features such as perimeter, e.g. particleperimeter could be computed using the method described in theaforementioned patent of Palcic et al. for computing the perimeter of anucleus. Another variation is to employ a different type of neighborhoodgraph (e.g. a Gabriel graph) for the purpose of computing features.Another variation is to compute graph-based features using k-adjacencyrather than just 1-adjacency; i.e. 2 particles are considered to bek-neighbors if a sequence of k edges connects them in the neighborhoodgraph. Another variation is that a region adjacency graph can be definedon the zones of influence image Z, computed during segmentation, andthat this be used in place of the neighborhood graph for the purpose ofcomputing features. Another variation is that nucleus features can becomputed from the set of lengths of shared borders between pairs ofzones of influence. Another variation is that nucleus features can becomputed from the histogram of centroid-to-centroid distances for pairsof neighboring particles (as defined by a neighborhood graph). Anothervariation is that the segmentation of the nucleus into dark particlesand light particles creates a partition of the nucleus pixels into 3sets and that these 3 sets can be used to compute some of the featuresthat Young, Verbeek, and Mayall compute for their method of chromatinsegmentation described earlier (note that the exact meaning and value ofthese will be completely different for the present invention). Anothervariation is that-nucleus features can be computed from histograms ofparticle features for particles that are adjacent to the nucleusboundary. Another variation is to compute particle features directlyfrom the nucleus image I rather than I′ (in which case the particlemasks must be down-sampled or the image I up-sampled). Another variationis that features can be normalized; e.g. particle area can be expressedas a percentage of the nucleus area. Another variation is that distancesother than Euclidean distance can be used to compute features; e.g.chessboard distance, city-block distance, 5-7-chamfer distance[Heijmans; Morphological image operators; Academic Press; San Diego;ISBN: 0-12-014599-5; 1994; pp. 325-331].

It is an intention of the inventors that the features computed using thepresent invention be used in a computerized system (an automatedcytometer and classifier) for the purpose of detecting neoplasia andmalignancy associated changes in cells and tissue.

EXAMPLE

The present invention has been applied to hundreds of thousands ofimages of cell nuclei, or objects that resemble nuclei, obtained from148 cytology slides using an automated cytometer. FIG. 13 shows a smallgallery of nuclei for which the dark particles have been segmented bythe invention. Each slide was prepared from a cervical swab taken from apatient. The set of slides include only one slide per patient. Theslides were prepared using a liquid-based preparation technique, andstained using the Papanicolaou staining protocol. Each slide has beenclassified by a pathologist as either negative (showing no signs ofcervical intraepithelial neoplasia (CIN)), or positive (showing signs ofCIN). In the latter case the pathologist has indicated the relativedegree of abnormality: CIN 1, CIN 2, or CIN 3. Of the 148 slides, 101are negative. The methods used by the cytometer to obtain the images ofnuclei, or objects resembling nuclei, are documented in PCT PatentApplication number PCT/AU01/00787 and PCT Patent Application numberPCT/AU99/00231.

The cytometer was programmed to extract images of at most 10000 objectsresembling nuclei (in terms of size, shape, and intensity) from eachslide. A large number of objects found by the cytometer are artifactssuch as blood, mucous, dust, poorly segmented nuclei, and out-of-focusimages of nuclei. The invention was used to perform artifact rejectionas follows. All objects that contained fewer than 5 dark particles, orthat contained very large dark particles (>⅓ of the area of the nucleus)were considered to be artifacts and rejected (too few dark particles, orextremely large dark particles, is indicative of poorly segmented orpoorly focused nuclei, or an image artifact such as dust, blood, ormucous).

The following features were computed and recorded for each nucleus:

(Note: sdev means standard deviation in the following tables) Deriveddirectly from the binary mask of the nucleus nucleus_areanucleus_perimeter

Derived from the invention nucleus_mean_dark_particle_areanucleus_sdev_dark_particle_area nucleus_mean_dark_particle_perimeternucleus_sdev_dark_particle_perimeternucleus_median_dark_particle_dynamics nucleus_IQR_dark_particle_dynamicsnucleus_mean_dark_particle_distance_to_nucleus_boundarynucleus_sdev_dark_particle_distance_to_nucleus_boundarynucleus_total_dark_particle_area nucleus_number_of_dark_particlesratio_of_total_dark_particle_area_to_nucleus_area

Means and standard deviations of these nucleus features yielded thefollowing slide features: Slide features derived from nucleus featuresslide_mean_of_nucleus_area slide_sdev_of_nucleus_areaslide_mean_of_nucleus_perimeter slide_sdev_of_nucleus_perimeterslide_mean_of_nucleus_mean_dark_particle_areaslide_sdev_of_nucleus_mean_dark_particle_areaslide_mean_of_nucleus_sdev_dark_particle_areaslide_sdev_of_nucleus_sdev_dark_particle_areaslide_mean_of_nucleus_mean_dark_particle_perimeterslide_sdev_of_nucleus_mean_dark_particle_perimeterslide_mean_of_nucleus_sdev_dark_particle_perimeterslide_sdev_of_nucleus_sdev_dark_particle_perimeterslide_mean_of_nucleus_median_dark_particle_dynamicsslide_sdev_of_nucleus_median_dark_particle_dynamicsslide_mean_of_nucleus_IQR_dark_particle_dynamicsslide_sdev_of_nucleus_IQR_dark_particle_dynamicsslide_mean_of_nucleus_mean_dark_particle_distance_to_(—)nucleus_boundaryslide_sdev_of_nucleus_mean_dark_particle_distance_to_(—)nucleus_boundaryslide_mean_of_nucleus_sdev_dark_particle_distance_to_(—)nucleus_boundaryslide_sdev_of_nucleus_sdev_dark_particle_distance_to_(—)nucleus_boundary slide_mean_of_nucleus_total_dark_particle_areaslide_sdev_of_nucleus_total_dark_particle_areaslide_mean_of_nucleus_number_of_dark_particlesslide_sdev_of_nucleus_number_of_dark_particlesslide_mean_of_ratio_of_total_dark_particle_area_to_(—) nucleus_areaslide_sdev_of_ratio_of_total_dark_particle_area_to_(—) nucleus_area

These 26 slide features, along with the pathologist classifications ofnegative and positive, were used to train and test a 2-class statisticalclassifier with the R statistical analysis software [R is free softwarefrom the GNU project www.gnu.org; Homepage ishttp://www.R-project.org/]. The training phase included featureselection (i.e. determining which features are useful for discriminatingbetween the 2 classes). The training was done on a subset of the featuredata made up of 70% of the negatives and 70% of the positives chosen atrandom. The remaining positives and negatives were used to test theclassifier. The performance of the classifier was recorded in terms ofthe area under the receiver operating characteristic (ROC) curve. Aclassifier for which the area under the ROC curve (AUC) is 0.5classifies randomly. A classifier for which the AUC is 1 is a perfectclassifier. The train/test procedure was independently repeated 100times (each time randomly splitting the positives and the negatives inthe ratio 7:3 for the training and testing data respectively). From the100 different trials, the following 7 features were selected more than70% of the time:

-   1. slide_mean_of_nucleus_number_of_dark_particles-   2. slide_mean_of_nucleus_perimeter-   3. slide_mean_of_nucleus_mean_dark_particle_area-   4. slide_sdev_of_nucleus_mean_dark_particle_area-   5. slide_sdev_of_nucleus_sdev_dark_particle_area-   6.    slide_sdev_of_nucleus_mean_dark_particle_distance_to_nucleus_boundary-   7.    slide_sdev_of_nucleus_sdev_dark_particle_distance_to_nucleus_boundary

From another 100 trials with only the 7 features above, the following 4features were selected more than 75% of the time:

-   1. slide_mean_of_nucleus_number_of_dark_particles-   2. slide_mean_of_nucleus_perimeter-   3. slide_mean_of_nucleus_mean_dark_particle_area-   4.    slide_sdev_of_nucleus_mean_dark_particle_distance_to_nucleus_boundary

With these 4 features the following estimate of the area under the ROCcurve was obtained:78%±0.06 (mean±standard deviation)

This indicates that features that summarize the size and distance to thenuclear boundary of the dark particles (condensed chromatin) in thenucleus are good features for detecting MACs. Moreover these featurescorrespond well to descriptive terms like clumping and margination usedby cytologists to describe chromatin texture.

Throughout the preceding specification the aim has been to describe thepreferred embodiment of the present invention without limiting theinvention to any one embodiment.

1. A method of segmenting chromatin particles in the nucleus of a cellincluding the following steps of: (i) locating regional minima in animage; (ii) computing a zone of influence (ZOI) around each regionalminimum; and (iii) segmenting a single chromatin blob within each ZOIusing a region growing procedure.
 2. The method of claim 1 wherein theimage is a two-dimensional gray-scale image comprising only those pixelsthat define the nucleus of a cell.
 3. The method of claim 1 wherein theimage is a three-dimensional image.
 4. The method of claim 1 wherein theimage is a multi-valued image.
 5. The method of claim 1 furtherincluding the step of evaluating the contrast of each regional minimumand discarding those regional minima that do not satisfy a priorispecified contrast criteria.
 6. The method of claim 1 further includinga preliminary step of pre-processing the image to correct fordegradations.
 7. The method of claim 6 wherein the pre-processing stepremoves degradations such as noise and blurring.
 8. The method of claim6 wherein the pre-processed image is up-sampled.
 9. The method of claim1 wherein the regional minima are regions of constant gray-value thatare surrounded by pixels of strictly higher (lighter) gray-value, toidentify dark blobs.
 10. The method of claim 1 wherein the step ofcomputing a zone of influence is performed by means of region growing orthe computation of influence zones (IZs).
 11. The method of claim 1wherein the segmentation step is performed by means of one of awatershed transform or a seeded region growing algorithm.
 12. A methodof quantitatively characterizing the distribution of nuclear chromatinincluding the steps of: (i) obtaining an image of a cell nucleus showingchromatin texture; (ii) locating regional minima in the image; (iii)computing a zone of influence (ZOI) around each regional minimum; (iv)segmenting a single chromatin blob within each ZOI using a regiongrowing procedure; and (v) computing features for individual chromatinblobs.
 13. The method of claim 12 further including the step of: (vi)synthesizing chromatin texture features from the features computed forindividual chromatin blob features.
 14. The method of claim 13 includingthe further step of repeating steps (ii) to (v) for the negative of theimage of step (i) and using the computed features in step (vi).
 15. Themethod of claim 12 further including the step of evaluating the contrastof each regional minimum in the image of the cell nucleus and discardingthose regional minima that do not satisfy a priori specified contrastcriteria.
 16. The method of claim 12 wherein the image of the cellnucleus is a gray-scale image.
 17. The method of claim 12 wherein thecomputed features are selected from morphometric features, densitometricfeatures, texture features, and contextual features.
 18. The method ofclaim 17 wherein the morphometric features include one or more of: area;perimeter; and G factor.
 19. The method of claim 17 wherein thedensitometric features include one or more of: volume; mean gray value;and dynamics.
 20. The method of claim 17 wherein the contextual featuresinclude one or more of: distance to nucleus boundary; number of darkparticle neighbors a dark particle has; mean distance between a darkparticle and its dark particle neighbors; number of light particleneighbors a light particle has; mean distance between a light particleand its light particle neighbors; number of dark particle neighbors;mean distance to dark particles; number of light particle neighbors;mean distance to light particles; number of particle neighbors; and meandistance to particle neighbors.
 21. The method of claim 17 wherein thesynthesized features are selected from nucleus features, slide featuresderived from nucleus features, and slide features derived from particlefeatures.
 22. The method of claim 21 wherein the nucleus featuresinclude one or more of: mean of each particle feature except dynamics;standard deviation of each particle feature except dynamics; mediandynamics; interquartile range of dynamics; number of dark particles;number of light particles; mean of the histogram obtained from theEuclidean distance transform of the nucleus background between the darkparticles; standard deviation of the histogram obtained from theEuclidean distance transform of the nucleus background between the darkparticles; mean of the histogram obtained from the Euclidean distancetransform of the nucleus background between the light particles;standard deviation of the histogram obtained from the Euclidean distancetransform of the nucleus background between the light particles; mean ofthe histogram obtained from the Euclidean distance transform of thenucleus background between both the light and dark particles; standarddeviation of the histogram obtained from the Euclidean distancetransform of the nucleus background between both the light and darkparticles; contrast; and energy computed from the co-occurrence matrixobtained from the neighborhood graph defined on the dark particles,using the particle feature area.